A Method for Automating Text Markup
Authors
Abstract
Markup languages based on XML are increasingly popular, and languages for other formats such as RDF are under active development. One of the problems involved in converting legacy documents to use XML or other markup formats is the insertion of tags into the document and the consequent rearrangement of text required when markup is added to an existing, un-marked-up document. This paper describes a method for automating part of the process of marking up such legacy documents. The approach is designed for semi-structured text documents, for example, technical documentation and narrative descriptions.
Similar resources
Automating XML markup of text documents
We present a novel system for automatically marking up text documents into XML and discuss the benefits of XML markup for intelligent information retrieval. The system uses the Self-Organizing Map (SOM) algorithm to arrange XML marked-up documents on a two-dimensional map so that similar documents appear closer to each other. It then employs an inductive learning algorithm, C5, to automatically ex...
Automating XML Markup using Machine Learning Techniques
In this paper we present a novel system for automatically marking up text documents into XML. The system uses the techniques of the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimensional map such that documents having similar content are placed close to each other. The C5.0 algori...
From XML to XML: The Why and How of Making the Biodiversity Literature Accessible to Researchers
We present the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These were developed during our ongoing work on automating the markup of scanned copies of the biodiversity literature. Such automation is required if historic literature is to be used to inform contemporary issues in biodiversity research. We consider a...
Model annotation for synthetic biology: automating model to nucleotide sequence conversion
Motivation: The need for the automated computational design of genetic circuits is becoming increasingly apparent with the advent of ever more complex and ambitious synthetic biology projects. Currently, most circuits are designed through the assembly of models of individual parts such as promoters, ribosome binding sites and coding sequences. These low level models are combined to produce a dyn...
Modeling Markup Shocks Using a DSGE Model (The Case of Iran)
This paper investigates the effects of markup shocks of domestic and export goods prices on macroeconomic variables by using a Dynamic Stochastic General Equilibrium (DSGE) model for Iran, in order to examine the effect of the growth of market power and monopoly in domestic and exporting markets from a macroeconomic viewpoint. To this end, the optimal pricing process of domestic, importing and ...